Key comparisons carried out by the Consultative Committees (CCs) of the International Committee of Weights and Measures (CIPM) or the Bureau International des Poids et Mesures (BIPM) are referred to as CIPM key comparisons. The outputs of a statistical analysis of the data from a CIPM key comparison are the key comparison reference value, the degrees of equivalence, and their associated uncertainties. The BIPM publications do not discuss statistical interpretation of these outputs. We discuss their interpretation under the following three statistical models: nonexistent laboratory-effects model, random laboratory-effects model, and systematic laboratory-effects model.

1. Introduction

Key comparisons are interlaboratory comparisons that serve as technical bases for Mutual Recognition Arrangements (MRA) between national metrology institutes (NMIs) [1]. Key comparisons carried out by the Consultative Committees (CCs) of the International Committee of Weights and Measures (CIPM) or the Bureau International des Poids et Mesures (BIPM) are referred to as CIPM key comparisons. Key comparisons carried out by regional metrology organizations (RMO) are referred to as RMO key comparisons. The guidelines for carrying out CIPM key comparisons are given in reference [2].

The objectives of a CIPM key comparison are described in reference [1]. We consider two interpretations of these objectives. A common interpretation is summarized by Nielsen [3] as follows: "The purpose of measurement intercomparisons between NMIs is to test, whether measurements performed in the participating countries are consistent taking into account the uncertainties assigned to the measurements. If an inconsistency is detected, the participating countries should take the corrective actions needed to obtain consistency. Otherwise, measurement results exchanged across borders cannot be considered equivalent without adding a 'between countries uncertainty,' which would be in disharmony with the concept of the SI system of units."

This paper is based on a second interpretation of the objectives of a CIPM key comparison: Generally, the participants of a CIPM key comparison are NMIs that are members of the appropriate Consultative Committee; at least some of these NMIs provide realizations of the SI values to establish the traceability of measurements made in their countries. The purpose of a CIPM key comparison is to establish the key comparison reference value¹, the degrees of equivalence², and their associated uncertainties on the basis of the data provided by the participants.

This paper is limited to a simple CIPM key comparison where the common measurand is a physical quantity of stable value during the comparison. Many CIPM key comparisons are not simple because it is often impractical or impossible to realize exactly the same measurand for or by all participants. We use the symbol $Y$ for the stable value of the measurand. The data provided by the participants of a simple CIPM key comparison are paired results and standard uncertainties $[x_1, u(x_1)], \ldots, [x_n, u(x_n)]$, where the results $x_1, \ldots, x_n$ are measurements of $Y$. The outputs of a statistical analysis of these data are the key comparison reference value $x_R$, the degree of equivalence $d_i = x_i - x_R$ of the result $x_i$, the degree of equivalence $d_{ij} = d_i - d_j = x_i - x_j$ of the results $x_i$ and $x_j$, and their associated standard uncertainties $u(x_R)$, $u(d_i)$, and $u(d_{ij})$, respectively, for $i, j = 1, 2, \ldots, n$ and $i \neq j$. The key comparison reference value $x_R$ is an estimate for $Y$. An estimate for $Y$ is a combined result of measurement determined from the data $[x_1, u(x_1)], \ldots, [x_n, u(x_n)]$.

An understanding of the difference between sampling probability distributions, used in classical (frequentist) statistics, and state-of-knowledge probability distributions, used in Bayesian statistics, is necessary for proper analysis and interpretation of the data from a key comparison. Briefly, they are defined as follows. In classical statistics, the value of the measurand is assumed to be an unknown constant, often called the true value, and each result of measurement is regarded as a realization of a random variable with a sampling distribution. A sampling distribution is a probability distribution that describes the relative frequencies of occurrence for all possible results of measurement when the conditions of measurement are hypothesized to be fixed at the intended levels [4]. The metrologist relates the expected values of the sampling distributions for the results of measurement to the value of the measurand. A classical (frequentist) statistical interpretation is a statement that relates the realized measurements to what one might expect if the key comparison could be repeated infinitely many times and throughout these repetitions the hypothesized sampling distributions continued to apply.

¹ "Key comparison reference value: the reference value accompanied by its uncertainty resulting from a CIPM key comparison [1]."

² "Degree of equivalence of a measurement standard: the degree to which the value of a measurement standard is consistent with the key comparison reference value. This is expressed quantitatively by the deviation from the key comparison reference value and the uncertainty of this deviation. The degree of equivalence between two measurement standards is expressed as the difference between their respective deviations from the key comparison reference value and the uncertainty of this difference [1]."

In Bayesian statistics, the measurement data are given constants and the value of the measurand is a random variable. A probability distribution for the value of the measurand is a state-of-knowledge distribution that describes the degrees of belief for all possible values that could be attributed to the measurand [4]. The belief is based on all available information including current results of measurement and scientific judgment based on prior and other data. Similar state-of-knowledge distributions apply to the other parameters involved in assessing the value of the measurand. A Bayesian interpretation is a statement that represents the state-of-knowledge about the value of the measurand based on state-of-knowledge distributions before measurements are made and a likelihood function conditional on the current measurements [4]. The ISO Guide [5] is consistent with a Bayesian interpretation of measurements but not with a classical (frequentist) interpretation [4].

We refer to the results $x_1, \ldots, x_n$ as laboratory results. The laboratory results $x_1, \ldots, x_n$ are regarded as realizations of random variables $x_1, \ldots, x_n$ with sampling distributions³. We use the symbols $X_1, \ldots, X_n$ for the expected values $E(x_1), \ldots, E(x_n)$ of the sampling distributions of $x_1, \ldots, x_n$, respectively. We refer to the expected values $X_1, \ldots, X_n$ as the laboratory expected values. We use the symbols $\sigma_1, \ldots, \sigma_n$ for the standard deviations $S(x_1), \ldots, S(x_n)$ of the sampling distributions of $x_1, \ldots, x_n$, respectively. Here $S(x_i)$ is the square root of the variance $V(x_i) = E[x_i - E(x_i)]^2$ of the sampling distribution of $x_i$ for $i = 1, 2, \ldots, n$. The uncertainties $u(x_1), \ldots, u(x_n)$ are statistical estimates of $\sigma_1, \ldots, \sigma_n$, respectively.

References [1] and [2] do not discuss statistical interpretations of the pairs $[x_R, u(x_R)]$, $[d_i, u(d_i)]$, and $[d_{ij}, u(d_{ij})]$. A statistical analysis of the data from a key comparison and interpretation of its outputs requires assumptions and models about the relationship between the data $[x_1, u(x_1)], \ldots, [x_n, u(x_n)]$ and the value $Y$ of the measurand. In Sec. 2, we discuss two assumptions, labeled as Assumption I and Assumption II, about the relationship between the laboratory expected values $X_1, \ldots, X_n$ and $Y$. Then we discuss two classical statistics models, a nonexistent laboratory-effects model and a random laboratory-effects model, based on Assumption I. Next, we propose a systematic laboratory-effects model based on Assumption II. We describe the key comparison reference value, the degrees of equivalence, and their associated uncertainties determined by each of the three statistical models. In Secs. 3 and 4, we discuss statistical interpretations of the pairs $[x_R, u(x_R)]$, $[d_i, u(d_i)]$, and $[d_{ij}, u(d_{ij})]$ under the three statistical models. Our conclusion is given in Sec. 5.

³ We use the symbols $x_1, \ldots, x_n$ for both the random variables and their realized values.

2. Statistical Assumptions and Models for 
the Relationship Between the Data and 
the Value of the Measurand 

In this section, we discuss statistical assumptions and 
models for analyzing the data from a simple CIPM key 
comparison to determine the key comparison reference 
value, the degrees of equivalence, and their associated 
uncertainties. 

2.1 Assumptions About the Relationship Between 
the Laboratory Expected Values and the 
Value of the Measurand 

One may either assume that the laboratory expected values $X_1, \ldots, X_n$ are all equal or allow for the possibility that $X_1, \ldots, X_n$ may not be equal.

Assumption I: The expected values $X_1, \ldots, X_n$ are all equal. The Assumption I defined so far does not specify the relationship between the results $x_1, \ldots, x_n$ and $Y$. Therefore, in concert with Assumption I, it is generally assumed that the common expected value is equal to $Y$, i.e., $X_1 = \cdots = X_n = Y$. Under Assumption I, the results $x_1, \ldots, x_n$ are subject to intralaboratory variations only.

Assumption II: The expected values $X_1, \ldots, X_n$ may not be equal, i.e., $X_i \neq X_j$ for some $i, j = 1, 2, \ldots, n$ and $i \neq j$. Therefore, not all of $X_1, \ldots, X_n$ may equal the value $Y$ of the measurand. The Assumption II defined so far does not specify the relationship between the results $x_1, \ldots, x_n$ and $Y$. Therefore, in concert with Assumption II, it is generally assumed that $Y$ is either somewhere in the range of results $x_1, \ldots, x_n$ or in the vicinity of this range⁴ [6]. Under Assumption II, the results $x_1, \ldots, x_n$ are subject to both the intralaboratory variations represented by the uncertainties $u(x_1), \ldots, u(x_n)$ and the interlaboratory variation arising from the dispersion of $X_1, \ldots, X_n$ about $Y$. The differences $(X_1 - Y), \ldots, (X_n - Y)$ are laboratory-effects (biases) due to unrecognized sources of error, denoted by $b_1, \ldots, b_n$, in the results $x_1, \ldots, x_n$. The biases are common to all measurements in a particular laboratory but may be different for different laboratories.

2.2 Assumption About the Uncertainties 
Submitted by the Participants 

The standard uncertainties $u(x_1), \ldots, u(x_n)$ submitted by the participants of a key comparison are estimates obtained by combining various estimated components of uncertainty in determining the value $Y$ of the measurand. A combined standard uncertainty $u(x_i)$ may be unreliable for various reasons. For example, a classical (frequentist) Type A component of $u(x_i)$ calculated from a small number of independent measurements is unreliable⁵ [5]. A Type A component of $u(x_i)$ based on unjustified statistical assumptions may be unreliable. A Type B component of $u(x_i)$ based on unreasonable state-of-knowledge distributions may be unreliable. A combined uncertainty $u(x_i)$ determined from an incomplete measurement equation may be an underestimate. The unreliability of estimated uncertainties $u(x_1), \ldots, u(x_n)$ is a component of uncertainty in determining the key comparison reference value $x_R$, the degrees of equivalence $d_i$ and $d_{ij}$, and their associated standard uncertainties. In this paper, we do not discuss the additional uncertainty that arises from the unreliability of $u(x_1), \ldots, u(x_n)$.

⁴ If the expected value $X_1$ were equal to the value $Y$ of the measurand, then according to the ISO Guide, the interval $[x_1 \pm 2u(x_1)]$ would represent an approximate range of the plausible values of $Y$. Likewise, if $X_2$ were equal to $Y$ then the interval $[x_2 \pm 2u(x_2)]$ would represent an approximate range of the plausible values of $Y$, and so on for $X_3, X_4, \ldots, X_n$. It follows from Assumption II that any one or more of the expected values $X_1, \ldots, X_n$ may be close to or equal to $Y$; therefore, the total interval consisting of the union of intervals $[x_i \pm 2u(x_i)]$, for $i = 1, 2, \ldots, n$, represents an approximate range of the plausible values of $Y$. However, most metrologists assign greater belief-probability to the middle than to the ends of the total interval.

⁵ The unreliability of a classical (frequentist) estimate of uncertainty arising from a small number of measurements is quantified by degrees of freedom [5].

Classical (frequentist) statistical analyses and interpretations discussed in this paper are based on the assumption that the estimated uncertainties $u(x_1), \ldots, u(x_n)$ are equal to the true standard deviations $\sigma_1, \ldots, \sigma_n$ of the sampling distributions of $x_1, \ldots, x_n$, respectively. Most metrologists make this assumption. For example, the expression $u(x_W) = 1/\sqrt{\sum_i w_i}$ for the standard deviation of the weighted mean $x_W = \sum_i w_i x_i / \sum_i w_i$, where $w_i = 1/u^2(x_i)$ for $i = 1, 2, \ldots, n$, requires this assumption.

Statistical analyses based on the ISO Guide regard a laboratory expected value $X_i$ as a variable with a state-of-knowledge distribution having expected value $x_i$ and standard deviation $u(x_i)$. Such analyses require the assumption that the estimated uncertainties $u(x_1), \ldots, u(x_n)$ are sufficiently reliable.

2.3 Classical (Frequentist) Statistics Models 
Based on Assumption I 

The weighted mean $x_W = \sum_i w_i x_i / \sum_i w_i$ and the expression $u(x_W) = 1/\sqrt{\sum_i w_i}$, where $w_i = 1/u^2(x_i)$ for $i = 1, 2, \ldots, n$, are often used as the key comparison reference value $x_R$ and its associated standard uncertainty $u(x_R)$, respectively. The use of $x_W$ as $x_R$ and $u(x_W)$ as $u(x_R)$ is based on the following classical (frequentist) statistics model.

2.3.1 Nonexistent Laboratory-Effects Model 

The results are regarded as realizations of the random variables $x_1, \ldots, x_n$, where

$$x_i = Y + e_i, \tag{1}$$

and $e_i = (x_i - Y)$ is the error in $x_i$ for $i = 1, 2, \ldots, n$. In this model, the parameter $Y$ is identified with the value of the measurand, and the errors $e_1, \ldots, e_n$ are assumed to be independently distributed random variables with sampling distributions. The sampling distributions of $e_1, \ldots, e_n$ are generally assumed to be normal (Gaussian). The expected values of $e_1, \ldots, e_n$ are assumed to be zero and the variances of $e_1, \ldots, e_n$ are assumed to be $u^2(x_1), \ldots, u^2(x_n)$, respectively. Under model (1) [represented by Eq. (1)], the expected value $E(x_i)$ is equal to $Y$ and the variance $V(x_i)$ is equal to $u^2(x_i)$, for $i = 1, 2, \ldots, n$. Since the expected values of all results are equal to $Y$, the model (1) is based on Assumption I. In model (1), the results $x_1, \ldots, x_n$ are free of laboratory-effects (biases). Therefore, we refer to it as a nonexistent laboratory-effects model. The best least-squares estimate for the parameter $Y$ of the nonexistent laboratory-effects model (1) is the weighted mean $x_W = \sum_i w_i x_i / \sum_i w_i$, where $w_i = 1/u^2(x_i)$ for $i = 1, 2, \ldots, n$. The term best least-squares estimate⁶ means that the estimate $x_W$ has the smallest variance among all estimates of $Y$ that are both linear functions of the results $x_1, \ldots, x_n$ and have the expected value $Y$. The standard deviation of the sampling distribution of $x_W$ is $u(x_W) = 1/\sqrt{\sum_i w_i}$. Thus the key comparison reference value $x_R$ based on model (1) is $x_W$ and $u(x_R)$ is $u(x_W)$. The corresponding degrees of equivalence are $d_i = x_i - x_W$ and $d_{ij} = x_i - x_j$, for $i, j = 1, 2, \ldots, n$ and $i \neq j$. The uncertainties $u(d_i)$ and $u(d_{ij})$ are determined from the sampling distributions of $x_1, \ldots, x_n$ and $x_R$ under model (1).

Note 1: When not all uncertainties $u(x_1), \ldots, u(x_n)$ are sufficiently reliable estimates of the true standard deviations $\sigma_1, \ldots, \sigma_n$, the true standard deviation of the sampling distribution of the weighted mean $x_W$ may be larger than the true standard deviation of the sampling distribution of the arithmetic mean $x_A$. Thus in this case the weighted mean $x_W$ may be an inferior key comparison reference value to the arithmetic mean $x_A$.
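
The quantities defined under model (1) can be sketched in a few lines of Python. This is only an illustration: the function names and the three-laboratory data set are hypothetical and do not come from any actual key comparison. The uncertainties $u(d_i)$ and $u(d_{ij})$ are left out because the text does not give explicit expressions for them.

```python
import math

def weighted_mean_kcrv(x, u):
    """Return (x_W, u(x_W)) with weights w_i = 1/u(x_i)^2, as in model (1)."""
    w = [1.0 / ui**2 for ui in u]
    sw = sum(w)
    x_w = sum(wi * xi for wi, xi in zip(w, x)) / sw
    return x_w, 1.0 / math.sqrt(sw)

def degrees_of_equivalence(x, x_ref):
    """Return the lists d_i = x_i - x_ref and d_ij = x_i - x_j (i != j)."""
    n = len(x)
    d = [xi - x_ref for xi in x]
    d_pair = {(i, j): x[i] - x[j]
              for i in range(n) for j in range(n) if i != j}
    return d, d_pair

# Hypothetical results and standard uncertainties from three laboratories.
x = [10.0, 10.2, 9.9]
u = [0.1, 0.2, 0.1]

x_w, u_w = weighted_mean_kcrv(x, u)
d, d_pair = degrees_of_equivalence(x, x_w)
print(x_w, u_w)
print(d)
```

Note that $d_{ij} = d_i - d_j$ holds regardless of which reference value is used, since the reference value cancels in the difference.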

2.3.2 Random Laboratory-Effects Model 

The classical statistics model based on Assumption I for the situation where the dispersion of results $x_1, \ldots, x_n$ may be more than what can reasonably be attributed to the intralaboratory variances $u^2(x_1), \ldots, u^2(x_n)$ is as follows. The results are regarded as realizations of the random variables $x_1, \ldots, x_n$, where

$$x_i = Y + b_i + e_i, \tag{2}$$

$b_i = (X_i - Y)$ is the laboratory effect (bias) in $x_i$ and $e_i = (x_i - X_i)$ is the intralaboratory error in $x_i$ for $i = 1, 2, \ldots, n$. The assumptions that relate the results $x_1, \ldots, x_n$ to $Y$ are as follows: the laboratory biases $b_1, \ldots, b_n$ are regarded as random variables having the same normal sampling distribution with expected value zero and variance $\sigma_b^2 \geq 0$, called interlaboratory variance; and $b_1, \ldots, b_n$ are assumed to be mutually independent and independent of the errors $e_1, \ldots, e_n$. The model (2) [represented by Eq. (2)] with these assumptions is referred to as a random laboratory-effects model [7]. Here the term random means that the biases $b_1, \ldots, b_n$ are regarded as random variables with the same sampling distribution that is assumed to be normal with expected value zero and variance $\sigma_b^2$. Under the random laboratory-effects model (2), the expected value $E(x_i)$ is equal to $Y$ and the variance $V(x_i)$ is equal to $\sigma_b^2 + u^2(x_i)$ for $i = 1, 2, \ldots, n$. The nonexistent laboratory-effects model (1) is a special case of the random laboratory-effects model (2) where $\sigma_b^2 = 0$, which means that the biases $b_1, \ldots, b_n$ are all zero, i.e., $X_1 = \cdots = X_n = Y$.

⁶ Least-squares estimation does not require that the errors $e_1, \ldots, e_n$

A popular estimate for the parameter $Y$ of model (2) is the weighted mean⁷ $x_W = \sum_i w_i x_i / \sum_i w_i$, where $w_i = 1/[s_b^2 + u^2(x_i)]$ and $s_b^2$ is an estimate for $\sigma_b^2$. Reference [8] discusses various methods for determining $s_b$. The estimate $s_b$ inflates each of the intralaboratory variances $u^2(x_1), \ldots, u^2(x_n)$ just enough to account for the dispersion of results $x_1, \ldots, x_n$ that is not accounted for by model (1). Under the assumption that the estimated variances $s_b^2 + u^2(x_1), \ldots, s_b^2 + u^2(x_n)$ are regarded as the true variances of the sampling distributions of $x_1, \ldots, x_n$, the best estimate of the parameter $Y$ of model (2) is the weighted mean $x_W$ and the standard uncertainty associated with $x_W$ is $u(x_W) = 1/\sqrt{\sum_i w_i}$, where $w_i = 1/[s_b^2 + u^2(x_i)]$ for $i = 1, 2, \ldots, n$ [9], [8]. Thus the key comparison reference value $x_R$ based on model (2) is the weighted mean $x_W = \sum_i w_i x_i / \sum_i w_i$ and uncertainty $u(x_R)$ is $u(x_W) = 1/\sqrt{\sum_i w_i}$, where $w_i = 1/[s_b^2 + u^2(x_i)]$ for $i = 1, 2, \ldots, n$. The corresponding degrees of equivalence are $d_i = x_i - x_R = x_i - x_W$ and $d_{ij} = x_i - x_j$, for $i, j = 1, 2, \ldots, n$ and $i \neq j$. The uncertainties associated with the degrees of equivalence are determined from the sampling distributions of $x_1, \ldots, x_n$ and $x_R$ under model (2).

The advantage of model (2) relative to model (1) is that it allows for the possibility that the dispersion of results $x_1, \ldots, x_n$ may be more than what can reasonably be attributed to the intralaboratory variances $u^2(x_1), \ldots, u^2(x_n)$. When the dispersion of $x_1, \ldots, x_n$ is not more than what can reasonably be attributed to $u^2(x_1), \ldots, u^2(x_n)$, the estimate $s_b$ is zero. In that case, model (2) yields the same $x_R$ and $u(x_R)$ as model (1). Therefore, there is no disadvantage to using model (2) in place of model (1).
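
The effect of the interlaboratory estimate $s_b$ on the reference value can be illustrated with a short Python sketch. The function name and data are hypothetical. The value of $s_b$ is passed in by the caller because the text defers its determination to reference [8]; the value 0.1 used below is an arbitrary assumption, not the output of any estimator.

```python
import math

def random_effects_kcrv(x, u, s_b):
    """Return (x_W, u(x_W)) with weights w_i = 1/(s_b^2 + u(x_i)^2).

    s_b is an externally supplied estimate of the interlaboratory
    standard deviation; s_b = 0 reproduces model (1).
    """
    w = [1.0 / (s_b**2 + ui**2) for ui in u]
    sw = sum(w)
    x_w = sum(wi * xi for wi, xi in zip(w, x)) / sw
    return x_w, 1.0 / math.sqrt(sw)

# Hypothetical results and standard uncertainties from three laboratories.
x = [10.0, 10.2, 9.9]
u = [0.1, 0.2, 0.1]

x0, u0 = random_effects_kcrv(x, u, s_b=0.0)   # same as model (1)
x1, u1 = random_effects_kcrv(x, u, s_b=0.1)   # assumed nonzero s_b
print(x0, u0)
print(x1, u1)
```

A nonzero $s_b$ makes the weights more nearly equal and enlarges $u(x_W)$, which is the inflation described above.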

The random laboratory-effects model (2) of classical statistics is conceptually faulty for the analysis of a CIPM key comparison for the following reasons. First, the participants of a CIPM key comparison are specific NMI laboratories rather than randomly chosen from a large population of laboratories. Therefore, the biases $b_1, \ldots, b_n$ may not be regarded as random variables with the same sampling distribution. Second, the assumption that the sampling distribution of the biases $b_1, \ldots, b_n$ is a normal distribution with expected value zero may not be justified. The next section introduces a new model that does not assume that the biases $b_1, \ldots, b_n$ are random variables with a normal sampling distribution.

⁷ We did not introduce a new symbol for the weighted mean determined from model (2) because model (1) is a special case of model (2).

2.4 A Model Based on Assumption II and 
the ISO Guide 

A statistical analysis of the data from a simple CIPM key comparison based on Assumption II requires one to account for the uncertainty that arises from the unknown bias in a combined result of measurement that is used as an estimate for $Y$. Before publication of the ISO Guide, there was no generally accepted approach to account for the uncertainty that arises from an unknown bias. The approach proposed by the ISO Guide to account for the uncertainty that arises from an unknown bias is now generally accepted. So we have used the ISO Guide to develop the following systematic laboratory-effects model.

2.4.1 Systematic Laboratory-Effects Model 

We start with a combined result of the form $\sum_i a_i x_i$, where $\sum_i a_i = 1$, that is used as an initial estimate for $Y$. This estimate requires the assumption that $Y$ is within the range of results $x_1, \ldots, x_n$. We refer to the initial estimate as the uncorrected combined result (UCR) and denote it by $x_{\mathrm{UCR}} = \sum_i a_i x_i$. If $a_i = w_i / \sum_i w_i$, then $x_{\mathrm{UCR}}$ is the weighted mean $x_W = \sum_i w_i x_i / \sum_i w_i$, where $w_i = 1/u^2(x_i)$ for $i = 1, 2, \ldots, n$. If $a_i = 1/n$ for $i = 1, 2, \ldots, n$, then $x_{\mathrm{UCR}}$ is the arithmetic mean $x_A = \sum_i x_i / n$. Let $X_{\mathrm{UCR}} = \sum_i a_i X_i$ be the expected value of the sampling distribution of $x_{\mathrm{UCR}}$. According to Assumption II, the result $x_{\mathrm{UCR}}$ is subject to the bias $(X_{\mathrm{UCR}} - Y)$. The ISO Guide recommends that the result $x_{\mathrm{UCR}}$ should be corrected to counter its possible bias and the uncertainty associated with the correction should be included in the combined standard uncertainty associated with the corrected result. The bias $(X_{\mathrm{UCR}} - Y)$ is an unknown constant but the correction for bias, denoted by $C$, is a variable with a state-of-knowledge probability distribution. If the expected value and standard deviation of a state-of-knowledge probability distribution for the correction variable $C$ are denoted by $c$ and $u(c)$, respectively, then the correction applied to the result $x_{\mathrm{UCR}}$ to counter its possible bias is $c$ and the standard uncertainty associated with the correction is $u(c)$.
In order to specify a state-of-knowledge probability distribution for the correction variable $C$, the laboratory expected values $X_1, \ldots, X_n$ and the value $Y$ of the measurand are regarded as variables with state-of-knowledge distributions and the data $x_1, \ldots, x_n$ and $u(x_1), \ldots, u(x_n)$ are regarded as given constants. A state-of-knowledge distribution for $X_i$ represents the state of knowledge about the value $Y$ of the measurand in the laboratory labeled $i$ for $i = 1, 2, \ldots, n$. The expected value $E(X_i)$ and standard deviation $S(X_i)$ of the variable $X_i$ are assumed to be $x_i$ and $u(x_i)$, respectively, for $i = 1, 2, \ldots, n$ [5], [4]. It follows that $X_{\mathrm{UCR}} = \sum_i a_i X_i$ is a variable with a state-of-knowledge probability distribution. The expected value of $X_{\mathrm{UCR}}$ is $E(X_{\mathrm{UCR}}) = \sum_i a_i E(X_i) = \sum_i a_i x_i = x_{\mathrm{UCR}}$. In the expression $(Y - X_{\mathrm{UCR}})$ for the negative of bias, treated as a variable, we replace $X_{\mathrm{UCR}}$ with its expected value $x_{\mathrm{UCR}}$. Then a probability distribution for $C$ represents belief about the possible values of $(Y - x_{\mathrm{UCR}})$, where $x_{\mathrm{UCR}}$ is a constant and $Y$ is the variable. The belief about possible values of $Y$ is based on all available information including results of measurement and scientific judgment. In reference [6], we proposed a triangular distribution for the correction variable $C$, with peak at and default limits $[x_{(1)} - x_{\mathrm{UCR}}] = \min\{x_1 - x_{\mathrm{UCR}}, \ldots, x_n - x_{\mathrm{UCR}}\}$ and $[x_{(n)} - x_{\mathrm{UCR}}] = \max\{x_1 - x_{\mathrm{UCR}}, \ldots, x_n - x_{\mathrm{UCR}}\}$. A criticism of the proposed triangular distribution with default limits is that it is determined by the extreme results $x_{(1)} = \min\{x_1, \ldots, x_n\}$ and $x_{(n)} = \max\{x_1, \ldots, x_n\}$, which are sometimes suspected to be in error.

Here, we propose a discrete-equal-probability distribution that is determined by all of the results $x_1, \ldots, x_n$. The results $x_1, \ldots, x_n$ are plausible values of $Y$ as determined by competent laboratories.⁸ So the known constant differences $(x_1 - x_{\mathrm{UCR}}), \ldots, (x_n - x_{\mathrm{UCR}})$ are plausible values of $(Y - x_{\mathrm{UCR}})$. These differences are a statistical basis for specifying a probability distribution for $C$. Let $c_i = x_i - x_{\mathrm{UCR}}$ for $i = 1, 2, \ldots, n$. Suppose $c_1, \ldots, c_n$ are assigned probabilities $p_1, \ldots, p_n$. Then the expected value of $C$ is $c = E(C) = \sum_i p_i c_i = (\sum_i p_i x_i) - x_{\mathrm{UCR}}$ and the standard deviation of $C$ is $u(c) = S(C) = \sqrt{\sum_i p_i (c_i - c)^2}$. Frequently the available scientific knowledge is inadequate to assign different probabilities $p_1, \ldots, p_n$ to $c_1, \ldots, c_n$. Therefore, we propose the discrete-equal-probability distribution for which $p_i = 1/n$ for $i = 1, 2, \ldots, n$. The expected value and standard deviation of $C$ based on the discrete-equal-probability distribution are $c = x_A - x_{\mathrm{UCR}}$ and $u(c) = \sqrt{\sum_i (x_i - x_A)^2 / n}$, respectively, where $x_A = \sum_i x_i / n$ is the arithmetic mean of the results $x_1, \ldots, x_n$.

A measurement equation is required to incorporate correction for possible bias in a combined result of measurement for $Y$. The measurement equation that corresponds to the bias $(X_{\mathrm{UCR}} - Y)$ in the uncorrected combined result $x_{\mathrm{UCR}}$ is $Y = X_{\mathrm{UCR}} + C$. This measurement equation is widely applicable in metrology [10]. It suggests the following model for the value $Y$ of the measurand:

$$E(X_i) = x_i, \quad S(X_i) = u(x_i), \quad X_{\mathrm{UCR}} = \sum_i a_i X_i, \quad Y = X_{\mathrm{UCR}} + C, \tag{3}$$

where $a_1, \ldots, a_n$ are constants such that $\sum_i a_i = 1$. In this model, $X_1, \ldots, X_n$, $X_{\mathrm{UCR}}$, $C$, and $Y$ are variables with state-of-knowledge distributions. The expected value and standard deviation of $X_i$ are the given constants $x_i$ and $u(x_i)$, respectively, for $i = 1, 2, \ldots, n$. A state-of-knowledge distribution for the correction variable $C$ is defined independently of the state-of-knowledge distributions for the variables $X_1, \ldots, X_n$, after the latter have been specified. In particular, $X_{\mathrm{UCR}}$ and $C$ are independently distributed. We refer to model (3) [represented by Eq. (3)] as a systematic laboratory-effects model to distinguish it from the random laboratory-effects model (2) that regards the biases (systematic errors) $b_1, \ldots, b_n$ as random variables having the same sampling distribution with expected value zero. Suppose the standard deviation of the variable $X_{\mathrm{UCR}}$ is $S(X_{\mathrm{UCR}}) = u(x_{\mathrm{UCR}})$. Then the corrected combined result for $Y$ determined from the systematic laboratory-effects model (3) is $y = x_{\mathrm{UCR}} + c$ and its associated standard uncertainty is $u(y) = \sqrt{u^2(x_{\mathrm{UCR}}) + u^2(c)}$.

The systematic laboratory-effects model (3) allows for the possibility that not all pairs of the variables $X_1, \ldots, X_n$ may be independently distributed. The variance $V(X_{\mathrm{UCR}}) = u^2(x_{\mathrm{UCR}})$ is determined from the variances and covariances of the variables $X_1, \ldots, X_n$. When the distributions of $X_1, \ldots, X_n$ are independent and $X_{\mathrm{UCR}}$ is the weighted mean $X_W = \sum_i w_i X_i / \sum_i w_i$, where $w_i = 1/V(X_i) = 1/u^2(x_i)$ for $i = 1, 2, \ldots, n$, then $u^2(x_{\mathrm{UCR}}) = V(X_W) = 1/[\sum_i w_i] = 1/\sum_i [1/u^2(x_i)]$. When the distributions of $X_1, \ldots, X_n$ are independent and $X_{\mathrm{UCR}}$ is the arithmetic mean $X_A = \sum_i X_i / n$, then $u^2(x_{\mathrm{UCR}}) = V(X_A) = (1/n^2) \sum_i V(X_i) = (1/n^2) \sum_i u^2(x_i)$.⁹

In order to specify $c$ and $u(c)$, one is free to use any reasonable distribution for $C$, based on scientific judgment. When the discrete-equal-probability distribution is used, $c = x_A - x_{\mathrm{UCR}}$ and $u(c) = \sqrt{\sum_i (x_i - x_A)^2 / n}$. In that case, the result of measurement for $Y$ is $y = x_{\mathrm{UCR}} + c = x_{\mathrm{UCR}} + x_A - x_{\mathrm{UCR}} = x_A$ and $u(y) = \sqrt{u^2(x_{\mathrm{UCR}}) + u^2(c)}$, where $u(c) = \sqrt{\sum_i (x_i - x_A)^2 / n}$.

⁸ As noted in the footnote of Sec. 2.1, the total interval consisting of the union of intervals $[x_i \pm 2u(x_i)]$, for $i = 1, 2, \ldots, n$, represents an approximate range of the plausible values of $Y$.

⁹ Since the harmonic mean of positive numbers is less than or equal to their arithmetic mean, $V(X_W) \leq V(X_A)$. When $u^2(x_1), \ldots, u^2(x_n)$ are equal, $V(X_W) = V(X_A)$.

Following the ISO Guide, the result $y$ and uncertainty $u(y)$ determined from the systematic laboratory-effects model (3) are interpreted as the expected value and standard deviation of a state-of-knowledge distribution for the values that could reasonably be attributed to $Y$ based on the data $x_1, \ldots, x_n$ and $u(x_1), \ldots, u(x_n)$ [5], [4], [6]. Thus the key comparison reference value $x_R$ based on the systematic laboratory-effects model (3) is $y$ and uncertainty $u(x_R)$ is $u(y)$. The corresponding degrees of equivalence are $d_i = x_i - y$ and $d_{ij} = x_i - x_j$ for $i, j = 1, 2, \ldots, n$ and $i \neq j$. The uncertainties $u(d_i)$ and $u(d_{ij})$ are determined from state-of-knowledge distributions for the variables $X_1, \ldots, X_n$ and $Y$.
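
The computation implied by model (3) with the discrete-equal-probability distribution can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and data are hypothetical, the variables $X_1, \ldots, X_n$ are taken to be independent so that $u^2(x_{\mathrm{UCR}}) = \sum_i a_i^2 u^2(x_i)$, and the uncorrected combined result is taken to be the weighted mean.

```python
import math

def systematic_effects_kcrv(x, u, a):
    """Return (y, u(y), c, u(c)) for model (3).

    a   : coefficients of the uncorrected combined result, summing to 1.
    Assumes independent X_i, so u^2(x_UCR) = sum_i a_i^2 u(x_i)^2,
    and the discrete-equal-probability distribution for C (p_i = 1/n).
    """
    if abs(sum(a) - 1.0) > 1e-9:
        raise ValueError("coefficients a_i must sum to 1")
    n = len(x)
    x_ucr = sum(ai * xi for ai, xi in zip(a, x))
    u2_ucr = sum((ai * ui) ** 2 for ai, ui in zip(a, u))
    x_a = sum(x) / n
    c = x_a - x_ucr                                      # expected correction
    u_c = math.sqrt(sum((xi - x_a) ** 2 for xi in x) / n)
    y = x_ucr + c                                        # equals x_A
    u_y = math.sqrt(u2_ucr + u_c ** 2)
    return y, u_y, c, u_c

# Hypothetical results and standard uncertainties from three laboratories.
x = [10.0, 10.2, 9.9]
u = [0.1, 0.2, 0.1]

# Weighted-mean coefficients a_i = w_i / sum(w), with w_i = 1/u(x_i)^2.
w = [1.0 / ui**2 for ui in u]
a = [wi / sum(w) for wi in w]

y, u_y, c, u_c = systematic_effects_kcrv(x, u, a)
d = [xi - y for xi in x]   # degrees of equivalence d_i = x_i - y
print(y, u_y, c, u_c)
```

Whatever coefficients $a_i$ are chosen, the corrected result $y$ equals the arithmetic mean; the choice affects only the $u^2(x_{\mathrm{UCR}})$ term in $u(y)$.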

3. Interpretation of the Key Comparison 
Reference Value and Its Associated 
Uncertainty 

3.1 Classical Statistics Models Based on Assumption I 

The nonexistent laboratory-effects model and the random laboratory-effects model are based on classical (frequentist) statistics. In particular, the results $x_1, \ldots, x_n$ are regarded as realizations of random variables with sampling distributions and $Y$ is an unknown constant. Therefore, the key comparison reference value $x_R$ is a realization of a random variable with a sampling distribution that has expected value $Y$ and standard deviation $u(x_R) = u(x_W) = 1/\sqrt{\sum_i w_i}$. In the nonexistent laboratory-effects model $w_i$ is $1/u^2(x_i)$ and in the random laboratory-effects model $w_i$ is $1/[s_b^2 + u^2(x_i)]$ for $i = 1, 2, \ldots, n$. The interval $[x_R \pm 2u(x_R)]$ determined from a classical statistics model is a confidence interval for $Y$ computed from the data $x_1, \ldots, x_n$ and $u(x_1), \ldots, u(x_n)$. Imagine that the CIPM key comparison could be repeated infinitely many times in exactly the same conditions using exactly the same instruments and artifacts. Now imagine that throughout these repetitions exactly the same sampling distributions continued to apply to the random variables $x_1, \ldots, x_n$. Then the confidence level is the fraction of the infinitely many hypothetical intervals, such as $[x_R \pm 2u(x_R)]$, that would include $Y$ [4].
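
The repeated-sampling reading of the interval can be imitated with a finite simulation. The sketch below assumes model (1) with normal errors whose standard deviations equal the stated uncertainties; the function name, the value of $Y$, the uncertainties, the number of repetitions, and the seed are all hypothetical choices, and a finite number of repetitions only approximates the idealized infinite sequence.

```python
import math
import random

def simulated_coverage(Y, u, k=2.0, repeats=20000, seed=1):
    """Fraction of simulated intervals [x_R - k u(x_R), x_R + k u(x_R)]
    that include Y, where x_R is the weighted mean under model (1)."""
    rng = random.Random(seed)
    w = [1.0 / ui**2 for ui in u]
    sw = sum(w)
    u_ref = 1.0 / math.sqrt(sw)
    hits = 0
    for _ in range(repeats):
        # One hypothetical repetition of the key comparison.
        x = [rng.gauss(Y, ui) for ui in u]
        x_ref = sum(wi * xi for wi, xi in zip(w, x)) / sw
        if abs(x_ref - Y) <= k * u_ref:
            hits += 1
    return hits / repeats

coverage = simulated_coverage(Y=10.0, u=[0.1, 0.2, 0.1])
print(coverage)
```

Under these assumptions the fraction should settle near the normal-distribution probability of falling within two standard deviations of the mean, about 0.954.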

3.2 Systematic Laboratory-Effects Model Based 
on Assumption II 

The key comparison reference value $x_R$ and uncertainty
$u(x_R)$ determined from the systematic laboratory-effects
model are given constants that represent the expected value
and standard deviation of a state-of-knowledge distribution
for $Y$ based on the data $x_1, \ldots, x_n$ and
$u(x_1), \ldots, u(x_n)$. The interval $[x_R \pm 2u(x_R)]$
determined from the systematic laboratory-effects model is
an expanded uncertainty interval for $Y$. The coverage
probability (level of confidence) of the interval
$[x_R \pm 2u(x_R)]$ is the fraction of a state-of-knowledge
distribution for $Y$ that is encompassed by this interval [4].
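
As a worked illustration of the coverage probability, suppose the
state-of-knowledge distribution for $Y$ is normal with expected
value $x_R$ and standard deviation $u(x_R)$. The normal form is an
assumption made here for the sketch; the text above does not
specify the shape of the distribution. The function name is
hypothetical.

```python
import math


def coverage_probability(k):
    """Fraction of a normal distribution within k standard deviations.

    Assumption: Y has a normal state-of-knowledge distribution with
    expected value x_R and standard deviation u(x_R), so that
    P(|Y - x_R| <= k * u(x_R)) = erf(k / sqrt(2)).
    """
    return math.erf(k / math.sqrt(2.0))


print(round(coverage_probability(2.0), 4))  # 0.9545
```

For a different distribution shape, the interval
$[x_R \pm 2u(x_R)]$ would generally encompass a different
fraction.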

4. Interpretation of the Degrees of 
Equivalence and Their Associated 
Uncertainties 

4.1 Classical Statistics Models Based on Assumption I 

In the random laboratory-effects model and its 
special case the nonexistent laboratory-effects model, 
the expected values of the sampling distributions of 
$x_1, \ldots, x_n$, and $x_R$ are all equal to $Y$. Therefore, the
expected values of the sampling distributions of all
degrees of equivalence $d_i = x_i - x_R$ and $d_{ij} = x_i - x_j$ are
zero, for $i, j = 1, 2, \ldots, n$ and $i \neq j$. This implies that all
computed degrees of equivalence, whether small or 
large, are statistical estimates of zero. In particular, 
according to these models, all degrees of equivalence 
published in the key comparison database (KCDB) 
[11] are estimates of zero. 
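
The step from equal expected values to degrees of equivalence
that estimate zero can be written out as follows. This is a
sketch that assumes $x_R$ is a weighted mean
$\sum_i w_i x_i / \sum_i w_i$ whose weights are treated as fixed
constants; in the random laboratory-effects model the weights
depend on an estimate of the between-laboratory variance, so
the first line is then only approximate.

```latex
\begin{align*}
E(x_R) &= \frac{\sum_{i=1}^{n} w_i\,E(x_i)}{\sum_{i=1}^{n} w_i} = Y, \\
E(d_i) &= E(x_i) - E(x_R) = Y - Y = 0, \\
E(d_{ij}) &= E(x_i) - E(x_j) = Y - Y = 0 .
\end{align*}
```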

4.2 Systematic Laboratory-Effects Model Based 
on Assumption II 

In the systematic laboratory-effects model, the results
$x_1, \ldots, x_n$ are the expected values and the uncertainties
$u(x_1), \ldots, u(x_n)$ are the standard deviations of
state-of-knowledge distributions for the laboratory
expected values $X_1, \ldots, X_n$, treated as variables. It
follows that the degree of equivalence
$d_i = x_i - x_R = x_i - y$ is the expected value of a
state-of-knowledge distribution for the laboratory effect
(bias) $X_i - Y$ for $i = 1, 2, \ldots, n$, and the degree of
equivalence $d_{ij} = x_i - x_j$ is the expected value of a
state-of-knowledge distribution for the difference
$X_i - X_j$ for $i, j = 1, 2, \ldots, n$ and $i \neq j$. The
uncertainty $u(d_i)$ is the standard deviation (see footnote 10)
of $X_i - Y$ and the uncertainty $u(d_{ij})$ is the standard
deviation of $X_i - X_j$, for $i, j = 1, 2, \ldots, n$ and
$i \neq j$.



10 The standard deviation of $X_i - Y$ depends on the covariance
between $X_i$ and $Y$ for $i = 1, 2, \ldots, n$. Since
$Y = X_{QR} + C = \sum_i a_i X_i + C$ and the variable $C$ is
distributed independently of the variables $X_1, \ldots, X_n$,
the covariances $C(X_i, Y)$, for $i = 1, 2, \ldots, n$, can be
determined from the variances and covariances of
$X_1, \ldots, X_n$. Then $u(d_i) = \sqrt{V(X_i - Y)}$, where the
variance $V(X_i - Y)$ is equal to
$V(X_i) + V(Y) - 2\,C(X_i, Y)$.
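
The calculation described in the footnote can be sketched in
code. The sketch assumes the variances and covariances of
$X_1, \ldots, X_n$ are supplied as a nested list `cov`, the
coefficients $a_i$ as a list `a`, and the variance of $C$ as
`var_c`. These names and the function names are hypothetical
and are not taken from the paper. The expression used for
$u(d_{ij})$ is the standard variance of a difference.

```python
import math


def u_unilateral(i, a, cov, var_c):
    """Standard deviation of X_i - Y, where Y = sum_j a_j X_j + C."""
    n = len(a)
    # C(X_i, Y) = sum_j a_j C(X_i, X_j), because C is independent of every X_j.
    cov_iy = sum(a[j] * cov[i][j] for j in range(n))
    # V(Y) = sum_j sum_k a_j a_k C(X_j, X_k) + V(C).
    var_y = sum(a[j] * a[k] * cov[j][k]
                for j in range(n) for k in range(n)) + var_c
    # V(X_i - Y) = V(X_i) + V(Y) - 2 C(X_i, Y).
    return math.sqrt(cov[i][i] + var_y - 2.0 * cov_iy)


def u_bilateral(i, j, cov):
    """Standard deviation of X_i - X_j."""
    return math.sqrt(cov[i][i] + cov[j][j] - 2.0 * cov[i][j])
```

For two uncorrelated laboratories with unit variances, equal
coefficients `a = [0.5, 0.5]`, and `var_c = 0`, the difference
$X_1 - Y$ reduces to $(X_1 - X_2)/2$, so `u_unilateral` returns
$\sqrt{0.5}$ and `u_bilateral` returns $\sqrt{2}$.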

5. Conclusion

We addressed a simple CIPM key comparison where 
the common measurand is a physical quantity of stable 
value during the comparison. We discussed statistical 
interpretation of the key comparison reference value, 
the degrees of equivalence, and their associated uncer- 
tainties determined from the following three statistical 
models: nonexistent laboratory-effects model, random 
laboratory-effects model, and systematic
laboratory-effects model. The first two models are based on
the classical (frequentist) interpretation of measurements.
The systematic laboratory-effects model is based on the
Bayesian interpretation of measurements.

The key comparison reference value $x_R$ and uncertainty
$u(x_R)$ determined from the systematic laboratory-effects
model represent the expected value and standard deviation
of a state-of-knowledge distribution for the value $Y$ of the
measurand. Therefore their statistical interpretation agrees
with the ISO Guide. According to the systematic
laboratory-effects model, the degree of equivalence $d_i$ and
uncertainty $u(d_i)$ are, respectively, the expected value and
standard deviation of a state-of-knowledge distribution for
the laboratory effect (bias) $X_i - Y$, for
$i = 1, 2, \ldots, n$, and the degree of equivalence $d_{ij}$
and uncertainty $u(d_{ij})$ are, respectively, the expected
value and standard deviation of a state-of-knowledge
distribution for the difference $X_i - X_j$, for
$i, j = 1, 2, \ldots, n$ and $i \neq j$. Thus the degrees of
equivalence determined
from the systematic laboratory-effects model quantitate 
the agreements and disagreements of laboratory results. 
Therefore, the systematic laboratory-effects model is 
suitable for the data analysis of a simple CIPM key 
comparison. 